metabolic process
- Europe > Finland > Uusimaa > Helsinki (0.04)
- North America > Greenland (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Surgery (0.95)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering
Pusch, Larissa, Conrad, Tim O. F.
Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLMs) and Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems, using a biomedical KG as an example. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactic and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question-answering systems. The source code for generating the results of this paper and for the user interface can be found in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui
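As a rough illustration of what one step of such a query checker might do, the sketch below validates the relationship direction in a simple Cypher pattern against a graph schema and flips the arrow when it is reversed. This is not the authors' implementation and does not use LangChain: the schema, node labels, relationship types, and the `check_query` function are all hypothetical, and only single-hop patterns of the form `(a:Label)-[:REL]->(b:Label)` are handled.

```python
import re

# Hypothetical schema: allowed (source label, relationship type, target label) triples.
SCHEMA = {
    ("Gene", "ASSOCIATED_WITH", "Disease"),
    ("Drug", "TREATS", "Disease"),
}

# Matches one single-hop pattern such as (g:Gene)-[:REL]->(d:Disease)
# or the reversed-arrow form (d:Disease)<-[:REL]-(g:Gene).
PATTERN = re.compile(r"\((\w*):(\w+)\)(<?)-\[:(\w+)\]-(>?)\((\w*):(\w+)\)")


def check_query(query: str) -> tuple[str, list[str]]:
    """Return the (possibly corrected) query and a list of issues found."""
    issues: list[str] = []

    def fix(match: re.Match) -> str:
        var_a, label_a, left, rel, right, var_b, label_b = match.groups()
        points_left, points_right = bool(left), bool(right)
        if points_left == points_right:
            # Either no arrow head or two arrow heads: leave untouched.
            issues.append(f"ambiguous direction for {rel}")
            return match.group(0)
        # Source and target labels as the query currently states them.
        src, dst = (label_a, label_b) if points_right else (label_b, label_a)
        if (src, rel, dst) in SCHEMA:
            return match.group(0)
        if (dst, rel, src) in SCHEMA:
            # The schema only knows the opposite direction: flip the arrow.
            issues.append(f"flipped direction of {rel}")
            if points_right:
                return f"({var_a}:{label_a})<-[:{rel}]-({var_b}:{label_b})"
            return f"({var_a}:{label_a})-[:{rel}]->({var_b}:{label_b})"
        issues.append(f"{src}-[{rel}]->{dst} not in schema")
        return match.group(0)

    return PATTERN.sub(fix, query), issues


if __name__ == "__main__":
    corrected, found = check_query("MATCH (d:Disease)-[:TREATS]->(x:Drug) RETURN x")
    print(corrected)
    print(found)
```

A pattern that matches no schema triple in either direction is reported but left unchanged, so a caller could decide whether to re-prompt the model or reject the query.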
Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research
Liu, Tianyu, Xiao, Yijia, Luo, Xiao, Xu, Hua, Zheng, W. Jim, Zhao, Hongyu
The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for three novel tasks in genomic and proteomic research. The models in Geneverse are trained and evaluated based on domain-specific datasets, and we use advanced parameter-efficient finetuning techniques to achieve the model adaptation for tasks including the generation of descriptions for gene functions, protein function inference from its structure, and marker gene selection from spatial transcriptomic data. We demonstrate that adapted LLMs and MLLMs perform well for these tasks and may outperform closed-source large-scale models based on our evaluations focusing on both truthfulness and structural correctness. All of the training strategies and base models we used are freely accessible.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
Liu, Zhiyuan, Zhang, An, Fei, Hao, Zhang, Enzhi, Wang, Xiang, Kawaguchi, Kenji, Chua, Tat-Seng
Language Models (LMs) excel in understanding textual descriptions of proteins, as evident in biomedical question-answering tasks. However, their capability falters with raw protein data, such as amino acid sequences, due to a deficit in pretraining on such data. Conversely, Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to process texts. To address their limitations, we introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding. ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation. This collaboration between PLM and LM is facilitated by a cross-modal projector (i.e., Q-Former) that bridges the modality gap between the PLM's representation space and the LM's input space. Unlike previous studies focusing on protein property prediction and protein-text retrieval, we delve into the largely unexplored field of protein-to-text generation. To facilitate comprehensive benchmarks and promote future research, we establish quantitative evaluations for protein-text modeling tasks, including protein captioning, protein question-answering, and protein-text retrieval. Our experiments show that ProtT3 substantially surpasses current baselines, with ablation studies further highlighting the efficacy of its core components. Our code is available at https://github.com/acharkq/ProtT3.
- Asia > Singapore (0.04)
- North America > United States (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Overview (0.93)
- Research Report > New Finding (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education > Health & Safety > School Nutrition (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Temporal Causal Mediation through a Point Process: Direct and Indirect Effects of Healthcare Interventions
Hızlı, Çağlar, John, ST, Juuti, Anne, Saarinen, Tuure, Pietiläinen, Kirsi, Marttinen, Pekka
Deciding on an appropriate intervention requires a causal model of a treatment, the outcome, and potential mediators. Causal mediation analysis lets us distinguish between direct and indirect effects of the intervention, but has mostly been studied in a static setting. In healthcare, data come in the form of complex, irregularly sampled time series, with dynamic interdependencies between a treatment, outcomes, and mediators across time. Existing approaches to dynamic causal mediation analysis are limited to regular measurement intervals and simple parametric models, and they disregard long-range mediator–outcome interactions. To address these limitations, we propose a non-parametric mediator–outcome model where the mediator is assumed to be a temporal point process that interacts with the outcome process. With this model, we estimate the direct and indirect effects of an external intervention on the outcome, showing how each of these affects the whole future trajectory. We demonstrate on semi-synthetic data that our method can accurately estimate direct and indirect effects. On real-world healthcare data, our model infers clinically meaningful direct and indirect effect trajectories for blood glucose after surgery.
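For orientation, the static setting mentioned above is commonly written with the following decomposition. This is the textbook form with a binary treatment, not the paper's time-dependent definition; the notation $Y(a, M(a'))$, the outcome under treatment $a$ with the mediator set to the value it would take under treatment $a'$, is an assumption introduced here.

```latex
\begin{align}
\mathrm{TE}  &= \mathbb{E}\big[Y(1, M(1))\big] - \mathbb{E}\big[Y(0, M(0))\big], \\
\mathrm{NDE} &= \mathbb{E}\big[Y(1, M(0))\big] - \mathbb{E}\big[Y(0, M(0))\big], \\
\mathrm{NIE} &= \mathbb{E}\big[Y(1, M(1))\big] - \mathbb{E}\big[Y(1, M(0))\big], \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}.
\end{align}
```

The direct effect changes the treatment while holding the mediator at its untreated behaviour, and the indirect effect changes only the mediator's behaviour; the two sum to the total effect because the term $\mathbb{E}[Y(1, M(0))]$ cancels.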
- Europe > Finland > Uusimaa > Helsinki (0.05)
- North America > Greenland (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Law > Alternative Dispute Resolution (0.82)
Latest Machine Learning Research Uncovers a Hidden Order in Scents
Alex Wiltschko is an olfactory neuroscientist on Google Research's Brain Team. He recently employed machine learning to analyze smell, our oldest and least understood sense. The findings considerably increased scientists' capacity to determine a molecule's scent from its structure. Over 800 chemicals reach your nose when you smell coffee. Our brains create the general impression of coffee from this chemical image.
AI Model Links Smell Molecules With Metabolic Processes
Alex Wiltschko began collecting perfumes as a teenager. His first bottle was Azzaro Pour Homme, a timeless cologne he spotted on the shelf at a T.J. Maxx department store. He recognized the name from Perfumes: The Guide, a book whose poetic descriptions of aroma had kick-started his obsession. Enchanted, he saved up his allowance to add to his collection. "I ended up going absolutely down the rabbit hole," he said.
- Health & Medicine (1.00)
- Education > Health & Safety > School Nutrition (0.41)
Inferring Microbial Biomass Yield and Cell Weight using Probabilistic Macrochemical Modeling
Paiva, Antonio R., Pilloni, Giovanni
Growth rates and biomass yields are key descriptors used in microbiology studies to understand how microbial species respond to changes in the environment. Of these, biomass yield estimates are typically obtained using cell counts and measurements of the feed substrate. However, these quantities are perturbed by measurement noise. Perhaps most crucially, estimating biomass from cell counts, as needed to assess yields, relies on an assumed cell weight. Noise and discrepancies in these assumptions can lead to significant changes in conclusions regarding a microbe's response. This article proposes a methodology to address these challenges using probabilistic macrochemical models of microbial growth. It is shown that a model can be developed to fully use the experimental data, greatly relax the assumptions on the cell weight, and provide uncertainty estimates of key parameters. These capabilities are demonstrated and validated herein using several case studies with synthetically generated microbial growth data.
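To make the dependence on the assumed cell weight concrete, the following minimal sketch computes a biomass yield from a cell count in the conventional deterministic way. It is not the probabilistic model proposed in the article; the function name `biomass_yield` and all numeric values are hypothetical and chosen only for illustration.

```python
def biomass_yield(cell_count: float, cell_weight_g: float,
                  substrate_consumed_g: float) -> float:
    """Yield in grams of biomass per gram of substrate consumed.

    Biomass is estimated as cell count times an assumed per-cell weight.
    """
    biomass_g = cell_count * cell_weight_g
    return biomass_g / substrate_consumed_g


if __name__ == "__main__":
    cells = 1e9          # hypothetical cell count
    substrate = 0.01     # hypothetical substrate consumed, in grams
    # Vary the assumed cell weight (in grams) and observe the yield estimate.
    for weight in (0.5e-12, 1e-12, 2e-12):
        y = biomass_yield(cells, weight, substrate)
        print(f"assumed cell weight {weight:.1e} g -> yield {y:.3f} g/g")
```

Because the estimate is linear in the assumed cell weight, any relative error in that assumption carries over unchanged to the yield.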
- North America > United States (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
Understanding metabolic processes through machine learning
Bioinformatics researchers at Heinrich Heine University Düsseldorf (HHU) and the University of California at San Diego (UCSD) are using machine learning techniques to better understand enzyme kinetics and thus also complex metabolic processes. The team led by first author Dr. David Heckmann has described its results in the current issue of the journal Nature Communications. The synthetic life sciences rely on a detailed and quantitative understanding of the complex systems in biological cells. Only if such systems are understood is their targeted manipulation possible. One system that is already well known is biological metabolism, in which many hundreds of enzymes are involved.
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.33)
- North America > United States > California > San Diego County > San Diego (0.30)
- Health & Medicine (0.95)
- Education > Health & Safety > School Nutrition (0.66)